HW 1

Author

Diana Tang

Libraries

library(leaflet)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.3     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(data.table)

Attaching package: 'data.table'

The following objects are masked from 'package:lubridate':

    hour, isoweek, mday, minute, month, quarter, second, wday, week,
    yday, year

The following objects are masked from 'package:dplyr':

    between, first, last

The following object is masked from 'package:purrr':

    transpose
library(lubridate)
library(dplyr)

Step 1: Read in data and check it

#Read in the EPA data from 2002
twozero <- fread("2002.csv")
#Read in the EPA data from 2022
twotwo <- fread("2022.csv")

Step 1a. Check the dimensions

dim(twozero)
[1] 15976    20

The dimensions of twozero, the 2002 data, are 15,976 rows (observations) by 20 columns.

dim(twotwo)
[1] 56140    20

The dimensions of twotwo, the 2022 data, are 56,140 rows (observations) by 20 columns.

Step 1b. Take a look at the variables

str(twozero)
Classes 'data.table' and 'data.frame':  15976 obs. of  20 variables:
 $ Date                          : chr  "01/05/2002" "01/06/2002" "01/08/2002" "01/11/2002" ...
 $ Source                        : chr  "AQS" "AQS" "AQS" "AQS" ...
 $ Site ID                       : int  60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 ...
 $ POC                           : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Daily Mean PM2.5 Concentration: num  25.1 31.6 21.4 25.9 34.5 41 29.3 15 18.8 37.9 ...
 $ UNITS                         : chr  "ug/m3 LC" "ug/m3 LC" "ug/m3 LC" "ug/m3 LC" ...
 $ DAILY_AQI_VALUE               : int  78 92 71 80 98 115 87 57 65 107 ...
 $ Site Name                     : chr  "Livermore" "Livermore" "Livermore" "Livermore" ...
 $ DAILY_OBS_COUNT               : int  1 1 1 1 1 1 1 1 1 1 ...
 $ PERCENT_COMPLETE              : num  100 100 100 100 100 100 100 100 100 100 ...
 $ AQS_PARAMETER_CODE            : int  88101 88101 88101 88101 88101 88101 88101 88101 88101 88101 ...
 $ AQS_PARAMETER_DESC            : chr  "PM2.5 - Local Conditions" "PM2.5 - Local Conditions" "PM2.5 - Local Conditions" "PM2.5 - Local Conditions" ...
 $ CBSA_CODE                     : int  41860 41860 41860 41860 41860 41860 41860 41860 41860 41860 ...
 $ CBSA_NAME                     : chr  "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" ...
 $ STATE_CODE                    : int  6 6 6 6 6 6 6 6 6 6 ...
 $ STATE                         : chr  "California" "California" "California" "California" ...
 $ COUNTY_CODE                   : int  1 1 1 1 1 1 1 1 1 1 ...
 $ COUNTY                        : chr  "Alameda" "Alameda" "Alameda" "Alameda" ...
 $ SITE_LATITUDE                 : num  37.7 37.7 37.7 37.7 37.7 ...
 $ SITE_LONGITUDE                : num  -122 -122 -122 -122 -122 ...
 - attr(*, ".internal.selfref")=<externalptr> 
str(twotwo)
Classes 'data.table' and 'data.frame':  56140 obs. of  20 variables:
 $ Date                          : chr  "01/01/2022" "01/02/2022" "01/03/2022" "01/04/2022" ...
 $ Source                        : chr  "AQS" "AQS" "AQS" "AQS" ...
 $ Site ID                       : int  60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 ...
 $ POC                           : int  3 3 3 3 3 3 3 3 3 3 ...
 $ Daily Mean PM2.5 Concentration: num  12.7 13.9 7.1 3.7 4.2 3.8 2.3 6.9 13.6 11.2 ...
 $ UNITS                         : chr  "ug/m3 LC" "ug/m3 LC" "ug/m3 LC" "ug/m3 LC" ...
 $ DAILY_AQI_VALUE               : int  52 55 30 15 18 16 10 29 54 47 ...
 $ Site Name                     : chr  "Livermore" "Livermore" "Livermore" "Livermore" ...
 $ DAILY_OBS_COUNT               : int  1 1 1 1 1 1 1 1 1 1 ...
 $ PERCENT_COMPLETE              : num  100 100 100 100 100 100 100 100 100 100 ...
 $ AQS_PARAMETER_CODE            : int  88101 88101 88101 88101 88101 88101 88101 88101 88101 88101 ...
 $ AQS_PARAMETER_DESC            : chr  "PM2.5 - Local Conditions" "PM2.5 - Local Conditions" "PM2.5 - Local Conditions" "PM2.5 - Local Conditions" ...
 $ CBSA_CODE                     : int  41860 41860 41860 41860 41860 41860 41860 41860 41860 41860 ...
 $ CBSA_NAME                     : chr  "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" ...
 $ STATE_CODE                    : int  6 6 6 6 6 6 6 6 6 6 ...
 $ STATE                         : chr  "California" "California" "California" "California" ...
 $ COUNTY_CODE                   : int  1 1 1 1 1 1 1 1 1 1 ...
 $ COUNTY                        : chr  "Alameda" "Alameda" "Alameda" "Alameda" ...
 $ SITE_LATITUDE                 : num  37.7 37.7 37.7 37.7 37.7 ...
 $ SITE_LONGITUDE                : num  -122 -122 -122 -122 -122 ...
 - attr(*, ".internal.selfref")=<externalptr> 
head(twozero)
         Date Source  Site ID POC Daily Mean PM2.5 Concentration    UNITS
1: 01/05/2002    AQS 60010007   1                           25.1 ug/m3 LC
2: 01/06/2002    AQS 60010007   1                           31.6 ug/m3 LC
3: 01/08/2002    AQS 60010007   1                           21.4 ug/m3 LC
4: 01/11/2002    AQS 60010007   1                           25.9 ug/m3 LC
5: 01/14/2002    AQS 60010007   1                           34.5 ug/m3 LC
6: 01/17/2002    AQS 60010007   1                           41.0 ug/m3 LC
   DAILY_AQI_VALUE Site Name DAILY_OBS_COUNT PERCENT_COMPLETE
1:              78 Livermore               1              100
2:              92 Livermore               1              100
3:              71 Livermore               1              100
4:              80 Livermore               1              100
5:              98 Livermore               1              100
6:             115 Livermore               1              100
   AQS_PARAMETER_CODE       AQS_PARAMETER_DESC CBSA_CODE
1:              88101 PM2.5 - Local Conditions     41860
2:              88101 PM2.5 - Local Conditions     41860
3:              88101 PM2.5 - Local Conditions     41860
4:              88101 PM2.5 - Local Conditions     41860
5:              88101 PM2.5 - Local Conditions     41860
6:              88101 PM2.5 - Local Conditions     41860
                           CBSA_NAME STATE_CODE      STATE COUNTY_CODE  COUNTY
1: San Francisco-Oakland-Hayward, CA          6 California           1 Alameda
2: San Francisco-Oakland-Hayward, CA          6 California           1 Alameda
3: San Francisco-Oakland-Hayward, CA          6 California           1 Alameda
4: San Francisco-Oakland-Hayward, CA          6 California           1 Alameda
5: San Francisco-Oakland-Hayward, CA          6 California           1 Alameda
6: San Francisco-Oakland-Hayward, CA          6 California           1 Alameda
   SITE_LATITUDE SITE_LONGITUDE
1:      37.68753      -121.7842
2:      37.68753      -121.7842
3:      37.68753      -121.7842
4:      37.68753      -121.7842
5:      37.68753      -121.7842
6:      37.68753      -121.7842
tail(twozero)
         Date Source  Site ID POC Daily Mean PM2.5 Concentration    UNITS
1: 12/10/2002    AQS 61131003   1                             15 ug/m3 LC
2: 12/13/2002    AQS 61131003   1                             15 ug/m3 LC
3: 12/22/2002    AQS 61131003   1                              1 ug/m3 LC
4: 12/25/2002    AQS 61131003   1                             23 ug/m3 LC
5: 12/28/2002    AQS 61131003   1                              5 ug/m3 LC
6: 12/31/2002    AQS 61131003   1                              6 ug/m3 LC
   DAILY_AQI_VALUE            Site Name DAILY_OBS_COUNT PERCENT_COMPLETE
1:              57 Woodland-Gibson Road               1              100
2:              57 Woodland-Gibson Road               1              100
3:               4 Woodland-Gibson Road               1              100
4:              74 Woodland-Gibson Road               1              100
5:              21 Woodland-Gibson Road               1              100
6:              25 Woodland-Gibson Road               1              100
   AQS_PARAMETER_CODE       AQS_PARAMETER_DESC CBSA_CODE
1:              88101 PM2.5 - Local Conditions     40900
2:              88101 PM2.5 - Local Conditions     40900
3:              88101 PM2.5 - Local Conditions     40900
4:              88101 PM2.5 - Local Conditions     40900
5:              88101 PM2.5 - Local Conditions     40900
6:              88101 PM2.5 - Local Conditions     40900
                                 CBSA_NAME STATE_CODE      STATE COUNTY_CODE
1: Sacramento--Roseville--Arden-Arcade, CA          6 California         113
2: Sacramento--Roseville--Arden-Arcade, CA          6 California         113
3: Sacramento--Roseville--Arden-Arcade, CA          6 California         113
4: Sacramento--Roseville--Arden-Arcade, CA          6 California         113
5: Sacramento--Roseville--Arden-Arcade, CA          6 California         113
6: Sacramento--Roseville--Arden-Arcade, CA          6 California         113
   COUNTY SITE_LATITUDE SITE_LONGITUDE
1:   Yolo      38.66121      -121.7327
2:   Yolo      38.66121      -121.7327
3:   Yolo      38.66121      -121.7327
4:   Yolo      38.66121      -121.7327
5:   Yolo      38.66121      -121.7327
6:   Yolo      38.66121      -121.7327
head(twotwo)
         Date Source  Site ID POC Daily Mean PM2.5 Concentration    UNITS
1: 01/01/2022    AQS 60010007   3                           12.7 ug/m3 LC
2: 01/02/2022    AQS 60010007   3                           13.9 ug/m3 LC
3: 01/03/2022    AQS 60010007   3                            7.1 ug/m3 LC
4: 01/04/2022    AQS 60010007   3                            3.7 ug/m3 LC
5: 01/05/2022    AQS 60010007   3                            4.2 ug/m3 LC
6: 01/06/2022    AQS 60010007   3                            3.8 ug/m3 LC
   DAILY_AQI_VALUE Site Name DAILY_OBS_COUNT PERCENT_COMPLETE
1:              52 Livermore               1              100
2:              55 Livermore               1              100
3:              30 Livermore               1              100
4:              15 Livermore               1              100
5:              18 Livermore               1              100
6:              16 Livermore               1              100
   AQS_PARAMETER_CODE       AQS_PARAMETER_DESC CBSA_CODE
1:              88101 PM2.5 - Local Conditions     41860
2:              88101 PM2.5 - Local Conditions     41860
3:              88101 PM2.5 - Local Conditions     41860
4:              88101 PM2.5 - Local Conditions     41860
5:              88101 PM2.5 - Local Conditions     41860
6:              88101 PM2.5 - Local Conditions     41860
                           CBSA_NAME STATE_CODE      STATE COUNTY_CODE  COUNTY
1: San Francisco-Oakland-Hayward, CA          6 California           1 Alameda
2: San Francisco-Oakland-Hayward, CA          6 California           1 Alameda
3: San Francisco-Oakland-Hayward, CA          6 California           1 Alameda
4: San Francisco-Oakland-Hayward, CA          6 California           1 Alameda
5: San Francisco-Oakland-Hayward, CA          6 California           1 Alameda
6: San Francisco-Oakland-Hayward, CA          6 California           1 Alameda
   SITE_LATITUDE SITE_LONGITUDE
1:      37.68753      -121.7842
2:      37.68753      -121.7842
3:      37.68753      -121.7842
4:      37.68753      -121.7842
5:      37.68753      -121.7842
6:      37.68753      -121.7842
tail(twotwo)
         Date Source  Site ID POC Daily Mean PM2.5 Concentration    UNITS
1: 12/01/2022    AQS 61131003   1                            3.4 ug/m3 LC
2: 12/07/2022    AQS 61131003   1                            3.8 ug/m3 LC
3: 12/13/2022    AQS 61131003   1                            6.0 ug/m3 LC
4: 12/19/2022    AQS 61131003   1                           34.8 ug/m3 LC
5: 12/25/2022    AQS 61131003   1                           23.2 ug/m3 LC
6: 12/31/2022    AQS 61131003   1                            1.0 ug/m3 LC
   DAILY_AQI_VALUE            Site Name DAILY_OBS_COUNT PERCENT_COMPLETE
1:              14 Woodland-Gibson Road               1              100
2:              16 Woodland-Gibson Road               1              100
3:              25 Woodland-Gibson Road               1              100
4:              99 Woodland-Gibson Road               1              100
5:              74 Woodland-Gibson Road               1              100
6:               4 Woodland-Gibson Road               1              100
   AQS_PARAMETER_CODE       AQS_PARAMETER_DESC CBSA_CODE
1:              88101 PM2.5 - Local Conditions     40900
2:              88101 PM2.5 - Local Conditions     40900
3:              88101 PM2.5 - Local Conditions     40900
4:              88101 PM2.5 - Local Conditions     40900
5:              88101 PM2.5 - Local Conditions     40900
6:              88101 PM2.5 - Local Conditions     40900
                                 CBSA_NAME STATE_CODE      STATE COUNTY_CODE
1: Sacramento--Roseville--Arden-Arcade, CA          6 California         113
2: Sacramento--Roseville--Arden-Arcade, CA          6 California         113
3: Sacramento--Roseville--Arden-Arcade, CA          6 California         113
4: Sacramento--Roseville--Arden-Arcade, CA          6 California         113
5: Sacramento--Roseville--Arden-Arcade, CA          6 California         113
6: Sacramento--Roseville--Arden-Arcade, CA          6 California         113
   COUNTY SITE_LATITUDE SITE_LONGITUDE
1:   Yolo      38.66121      -121.7327
2:   Yolo      38.66121      -121.7327
3:   Yolo      38.66121      -121.7327
4:   Yolo      38.66121      -121.7327
5:   Yolo      38.66121      -121.7327
6:   Yolo      38.66121      -121.7327

Step 1c. Checking the key variable Daily Mean PM 2.5

table(is.na(twozero$`Daily Mean PM2.5 Concentration`))

FALSE 
15976 
table(is.na(twotwo$`Daily Mean PM2.5 Concentration`))

FALSE 
56140 

Step 1d. Check the summaries

summary(twozero$`Daily Mean PM2.5 Concentration`)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00    7.00   12.00   16.12   20.50  104.30 
summary(twotwo$`Daily Mean PM2.5 Concentration`)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  -2.20    4.20    6.90    8.52   10.80  302.50 
table(is.na(twotwo))

  FALSE    TRUE 
1118601    4199 
table(is.na(twozero))

 FALSE   TRUE 
318591    929 

Summary findings: After importing the 2002 and 2022 data sets, there were 929 NA cells in the 2002 data and 4,199 NA cells in the 2022 data, none of them in the key variable. For the key variable, Daily Mean PM2.5 Concentration, the 2002 values ranged from a minimum of 0.00 to a maximum of 104.30, with a median of 12.00 and a mean of 16.12. In 2022 the minimum was -2.20 and the maximum was 302.50, a much wider range than in 2002. However, the 2022 median was 6.90 and the mean was 8.52, both lower than in 2002.
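
To see where missing values sit, the whole-table count can be broken down by column. The following is a minimal sketch on a small made-up table; the `toy` data frame and its values are illustrative assumptions, not the EPA data.

```r
# Toy stand-in for one year of data; values are invented for illustration.
toy <- data.frame(
  PM2.5     = c(25.1, 31.6, 21.4, 25.9),
  CBSA_CODE = c(41860, NA, 41860, NA),
  CBSA_NAME = c("A", NA, "A", NA),
  stringsAsFactors = FALSE
)

# Total NA cells: the same quantity table(is.na(x)) reports under TRUE.
total_na <- sum(is.na(toy))

# NA cells per column, which shows which variables carry the gaps.
na_by_col <- colSums(is.na(toy))
print(na_by_col)
```

Running the same two calls on twozero and twotwo would show which columns account for the 929 and 4,199 NA cells.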

Step 2. Combine data

Step 2a. New year column identifier

twozero$year <- 2002
twotwo$year <- 2022

Step 2b. Renaming key variables for 2002 and 2022 data sets

twozero$PM2.5 <- twozero$`Daily Mean PM2.5 Concentration`
twozero$`Daily Mean PM2.5 Concentration` <- NULL
twozero$lat <- twozero$SITE_LATITUDE
twozero$SITE_LATITUDE <- NULL
twozero$lon <- twozero$SITE_LONGITUDE
twozero$SITE_LONGITUDE <- NULL
twotwo$PM2.5 <- twotwo$`Daily Mean PM2.5 Concentration`
twotwo$`Daily Mean PM2.5 Concentration` <- NULL
twotwo$lat <- twotwo$SITE_LATITUDE
twotwo$SITE_LATITUDE <- NULL
twotwo$lon <- twotwo$SITE_LONGITUDE
twotwo$SITE_LONGITUDE <- NULL

Step 2c. Merge the data

combined_data <- rbind(twozero,twotwo)
print(combined_data)
             Date Source  Site ID POC    UNITS DAILY_AQI_VALUE
    1: 01/05/2002    AQS 60010007   1 ug/m3 LC              78
    2: 01/06/2002    AQS 60010007   1 ug/m3 LC              92
    3: 01/08/2002    AQS 60010007   1 ug/m3 LC              71
    4: 01/11/2002    AQS 60010007   1 ug/m3 LC              80
    5: 01/14/2002    AQS 60010007   1 ug/m3 LC              98
   ---                                                        
72112: 12/07/2022    AQS 61131003   1 ug/m3 LC              16
72113: 12/13/2022    AQS 61131003   1 ug/m3 LC              25
72114: 12/19/2022    AQS 61131003   1 ug/m3 LC              99
72115: 12/25/2022    AQS 61131003   1 ug/m3 LC              74
72116: 12/31/2022    AQS 61131003   1 ug/m3 LC               4
                  Site Name DAILY_OBS_COUNT PERCENT_COMPLETE AQS_PARAMETER_CODE
    1:            Livermore               1              100              88101
    2:            Livermore               1              100              88101
    3:            Livermore               1              100              88101
    4:            Livermore               1              100              88101
    5:            Livermore               1              100              88101
   ---                                                                         
72112: Woodland-Gibson Road               1              100              88101
72113: Woodland-Gibson Road               1              100              88101
72114: Woodland-Gibson Road               1              100              88101
72115: Woodland-Gibson Road               1              100              88101
72116: Woodland-Gibson Road               1              100              88101
             AQS_PARAMETER_DESC CBSA_CODE
    1: PM2.5 - Local Conditions     41860
    2: PM2.5 - Local Conditions     41860
    3: PM2.5 - Local Conditions     41860
    4: PM2.5 - Local Conditions     41860
    5: PM2.5 - Local Conditions     41860
   ---                                   
72112: PM2.5 - Local Conditions     40900
72113: PM2.5 - Local Conditions     40900
72114: PM2.5 - Local Conditions     40900
72115: PM2.5 - Local Conditions     40900
72116: PM2.5 - Local Conditions     40900
                                     CBSA_NAME STATE_CODE      STATE
    1:       San Francisco-Oakland-Hayward, CA          6 California
    2:       San Francisco-Oakland-Hayward, CA          6 California
    3:       San Francisco-Oakland-Hayward, CA          6 California
    4:       San Francisco-Oakland-Hayward, CA          6 California
    5:       San Francisco-Oakland-Hayward, CA          6 California
   ---                                                              
72112: Sacramento--Roseville--Arden-Arcade, CA          6 California
72113: Sacramento--Roseville--Arden-Arcade, CA          6 California
72114: Sacramento--Roseville--Arden-Arcade, CA          6 California
72115: Sacramento--Roseville--Arden-Arcade, CA          6 California
72116: Sacramento--Roseville--Arden-Arcade, CA          6 California
       COUNTY_CODE  COUNTY year PM2.5      lat       lon
    1:           1 Alameda 2002  25.1 37.68753 -121.7842
    2:           1 Alameda 2002  31.6 37.68753 -121.7842
    3:           1 Alameda 2002  21.4 37.68753 -121.7842
    4:           1 Alameda 2002  25.9 37.68753 -121.7842
    5:           1 Alameda 2002  34.5 37.68753 -121.7842
   ---                                                  
72112:         113    Yolo 2022   3.8 38.66121 -121.7327
72113:         113    Yolo 2022   6.0 38.66121 -121.7327
72114:         113    Yolo 2022  34.8 38.66121 -121.7327
72115:         113    Yolo 2022  23.2 38.66121 -121.7327
72116:         113    Yolo 2022   1.0 38.66121 -121.7327
dim(combined_data)
[1] 72116    21

The combined data has 72,116 observations, which is the sum of the 15,976 observations from the 2002 data and the 56,140 observations from the 2022 data. It has 21 columns: the original 20 plus the new year column.
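
The row-count reasoning above can also be checked in code rather than by eye. This is a small sketch on invented tables (`a` and `b` are hypothetical stand-ins for the two yearly data sets):

```r
# Toy stand-ins for the two yearly tables; values are invented.
a <- data.frame(PM2.5 = c(25.1, 31.6), year = 2002)
b <- data.frame(PM2.5 = c(12.7, 13.9, 7.1), year = 2022)

stacked <- rbind(a, b)

# The stacked row count should equal the sum of the inputs.
stopifnot(nrow(stacked) == nrow(a) + nrow(b))

# Each year should contribute exactly its own rows.
rows_per_year <- table(stacked$year)
print(rows_per_year)
```

For the real data, the equivalent arithmetic is 15,976 + 56,140 = 72,116.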

Step 3. Leaflet plot

#Color code years (year has only two values, so use a categorical palette)
year.pal <- colorFactor(c('lightpink','mediumaquamarine'), domain=combined_data$year)

#Formulas (~lat, ~lon, ...) are evaluated against the data passed to leaflet()
leaflet(combined_data) |>
  addProviderTiles('CartoDB.Positron') |>
  addCircles(lat=~lat, lng=~lon, label=~paste0(round(PM2.5,2)), opacity=0.5, fillOpacity=1, radius=200, color=~year.pal(year)) |>
  addLegend('bottomleft', pal=year.pal, values=~year,
          title='year', opacity=1)

Compared with the 2002 site locations, the 2022 sites appear to have expanded along the coast and into central California. In 2002, the sites were concentrated around the major cities and along the eastern border.
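
The visual impression of more sites in 2022 can be backed up by counting distinct sites per year. A minimal sketch, assuming a table with a site identifier and a year column; the `toy` data frame and its `site_id` column are hypothetical:

```r
# Toy stand-in for combined_data; site IDs and years are invented.
toy <- data.frame(
  site_id = c(1, 1, 2, 1, 2, 3, 3, 4),
  year    = c(2002, 2002, 2002, 2022, 2022, 2022, 2022, 2022)
)

# Keep one row per site-year pair, then count sites within each year.
site_years <- unique(toy[, c("site_id", "year")])
sites_per_year <- table(site_years$year)
print(sites_per_year)
```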

Step 4. Check for missing data

table(is.na(combined_data$PM2.5))

FALSE 
72116 

The check above shows that there are no missing values for PM2.5. Potentially implausible values include the maximum of 302.50 in the 2022 data, which is far above the 2002 maximum of 104.30, and the negative minimum of -2.20 in 2022, since a concentration cannot be below zero.
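
One way to follow up on implausible values is to flag them and see how common they are. The sketch below uses invented values, and the cutoff `high_cutoff` is a hypothetical threshold chosen only for this example, not an EPA standard.

```r
# Toy PM2.5 values (ug/m3); invented for illustration.
pm <- c(12.7, -2.2, 6.9, 302.5, 4.2, 0.0)

# A concentration cannot be below zero, so negatives are flagged.
is_negative <- pm < 0

# Hypothetical cutoff for "unusually high"; chosen only for this example.
high_cutoff <- 250
is_high <- pm > high_cutoff

# How many values are negative, and what share of the data they make up.
n_negative <- sum(is_negative)
share_negative <- mean(is_negative)

# The flagged values themselves.
print(pm[is_negative | is_high])
```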

Step 5. Three Different Spatial Levels

2002 and 2022 Data at State Level

hist(twozero$PM2.5, breaks=100)

hist(twotwo$PM2.5, breaks=100)

summary(twozero$PM2.5)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00    7.00   12.00   16.12   20.50  104.30 
summary(twotwo$PM2.5)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  -2.20    4.20    6.90    8.52   10.80  302.50 
twozero_summaryPM2.5 <- summary(twozero$PM2.5)
twotwo_summaryPM2.5 <- summary(twotwo$PM2.5)

median_val0 <- twozero_summaryPM2.5["Median"]
q10_val <- twozero_summaryPM2.5["1st Qu."]
q30_val <- twozero_summaryPM2.5["3rd Qu."]


median_val <- twotwo_summaryPM2.5["Median"]
q1_val <- twotwo_summaryPM2.5["1st Qu."]
q3_val <- twotwo_summaryPM2.5["3rd Qu."]

# Create a boxplot using the extracted summary statistics
boxplot(
  median_val0,  # Median
  q10_val,      # 1st Quartile
  q30_val,      # 3rd Quartile
  main = "Summary Boxplot of 2002",
  names = c("Median", "1st Quartile", "3rd Quartile"),
  ylab = "Values"
) 

boxplot(
  median_val,  # Median
  q1_val,      # 1st Quartile
  q3_val,      # 3rd Quartile
  main = "Summary Boxplot of 2022",
  names = c("Median", "1st Quartile", "3rd Quartile"),
  ylab = "Values"
)

boxplot(twozero$PM2.5,
        main = "Boxplot 2002",
        xlab = "2002",
        ylab = "Daily mean PM2.5 (ug/m3 LC)",
        col = "lightblue")

boxplot(twotwo$PM2.5,
        main = "Boxplot 2022",
        xlab = "2022",
        ylab = "Daily mean PM2.5 (ug/m3 LC)",
        col = "lightgreen")

At the state level, the 2022 median and mean are lower than in 2002 (median 6.90 vs 12.00; mean 8.52 vs 16.12). However, the 2022 data also show many more outlier values, some far above the 2002 maximum (302.50 in 2022 vs 104.30 in 2002).
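
The two years can also be compared in a single plot on a shared axis, which makes the shift in the median and the outliers easier to read. A minimal sketch using the formula interface of `boxplot()` on invented values (the `toy` data frame is a hypothetical stand-in for combined_data):

```r
# Toy stand-in for combined_data; values are invented for illustration.
toy <- data.frame(
  PM2.5 = c(7, 12, 20, 30, 4, 7, 11, 60),
  year  = rep(c(2002, 2022), each = 4)
)

# Formula interface: one box per year on a shared axis.
# plot = FALSE returns the statistics without drawing; drop it to draw.
bp <- boxplot(PM2.5 ~ year, data = toy, plot = FALSE)

# Row 3 of bp$stats holds the median for each group.
medians <- setNames(bp$stats[3, ], bp$names)
print(medians)
```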

2002 and 2022 Data at County Level - Los Angeles County Code 37

# Data for LA County 2002
LACOUNTY <- subset(twozero, COUNTY == 'Los Angeles')

# Plotting a histogram
hist(LACOUNTY$PM2.5, main = "Histogram of PM2.5 in Los Angeles in 2002", xlab = "PM2.5")

boxplot(LACOUNTY$PM2.5,
        main = "Boxplot LA County 2002",
        xlab = "Los Angeles County, 2002",
        ylab = "Daily mean PM2.5 (ug/m3 LC)",
        col = "darkgreen")

summary(LACOUNTY$PM2.5)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.60   11.10   17.40   19.66   25.50   72.40 
#Dates are stored as month/day/year, so the format must be given explicitly
LACOUNTY$DATENUM <- as.Date(LACOUNTY$Date, format = "%m/%d/%Y")

plot(LACOUNTY$DATENUM,LACOUNTY$PM2.5, type = 'l')

# Data for LA County 2022
LACOUNTY2022 <- subset(twotwo, COUNTY == 'Los Angeles')

# Plotting a histogram
hist(LACOUNTY2022$PM2.5, main = "Histogram of PM2.5 in Los Angeles in 2022", xlab = "PM2.5")

boxplot(LACOUNTY2022$PM2.5,
        main = "Boxplot LA County 2022",
        xlab = "Los Angeles County, 2022",
        ylab = "Daily mean PM2.5 (ug/m3 LC)",
        col = "aquamarine")

summary(LACOUNTY2022$PM2.5)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  -1.20    7.40   10.30   10.97   13.70   56.00 
#Dates are stored as month/day/year, so the format must be given explicitly
LACOUNTY2022$DATENUM <- as.Date(LACOUNTY2022$Date, format = "%m/%d/%Y")

plot(LACOUNTY2022$DATENUM,LACOUNTY2022$PM2.5, type = 'l')

As we can now see, there appears to be more data recorded for LA County in 2022 than in 2002. Comparing the means and medians of PM2.5 shows an overall decrease in LA County daily concentrations from 2002 to 2022 (median 10.30 vs 17.40; mean 10.97 vs 19.66). This is consistent with the decrease seen at the state level. Of note, in 2022 there appears to be a spike in PM2.5 during the summer months.
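
The seasonal reading above relies on the Date strings being interpreted as month/day/year. The following is a minimal sketch, on invented values, of parsing with an explicit format and then averaging by calendar month; the vectors `dates_chr` and `pm` are hypothetical.

```r
# Dates in the EPA export are month/day/year strings, e.g. "01/05/2002".
dates_chr <- c("01/05/2002", "07/14/2002", "07/20/2002", "12/31/2002")
pm        <- c(25.1, 40.0, 50.0, 6.0)   # invented values

# Give the format explicitly so month and day are not misread.
dates <- as.Date(dates_chr, format = "%m/%d/%Y")

# Mean PM2.5 by calendar month (1 = January, ..., 12 = December).
month_num <- as.integer(format(dates, "%m"))
monthly_mean <- tapply(pm, month_num, mean)
print(monthly_mean)
```

A monthly table like this gives a numeric check on whether a summer spike is present, independent of how the line plot is drawn.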

Site Name - Los Angeles-North Main Street

# Data for LA SITE2002
twozero$SITENAME <- twozero$`Site Name`

#Subset Data
LASITE0 <- twozero[twozero$SITENAME == 'Los Angeles-North Main Street',]

# Plotting a histogram
hist(LASITE0$PM2.5, main = "Histogram of PM2.5 in Los Angeles North Main Street 2002", xlab = "PM2.5")

boxplot(LASITE0$PM2.5,
        main = "Boxplot LA SITE 2002",
        xlab = "Los Angeles-North Main Street, 2002",
        ylab = "Daily mean PM2.5 (ug/m3 LC)",
        col = "mediumaquamarine")

summary(LASITE0$PM2.5)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   3.90   13.90   19.30   21.97   26.90   66.30 
#Dates are stored as month/day/year, so the format must be given explicitly
LASITE0$DATENUM <- as.Date(LASITE0$Date, format = "%m/%d/%Y")

plot(LASITE0$DATENUM,LASITE0$PM2.5, type = 'l')

# Data for LA SITE2022
twotwo$SITENAME <- twotwo$`Site Name`

#Subset Data
LASITE22 <- twotwo[twotwo$SITENAME == 'Los Angeles-North Main Street',]

# Plotting a histogram
hist(LASITE22$PM2.5, main = "Histogram of PM2.5 in Los Angeles North Main Street 2022", xlab = "PM2.5")

boxplot(LASITE22$PM2.5,
        main = "Boxplot LA SITE 2022",
        xlab = "Los Angeles-North Main Street, 2022",
        ylab = "Daily mean PM2.5 (ug/m3 LC)",
        col = "mediumpurple")

summary(LASITE22$PM2.5)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  2.400   8.325  10.900  11.583  14.300  38.000 
#Dates are stored as month/day/year, so the format must be given explicitly
LASITE22$DATENUM <- as.Date(LASITE22$Date, format = "%m/%d/%Y")

plot(LASITE22$DATENUM,LASITE22$PM2.5, type = 'l')

Consistent with the previous patterns, this LA site showed a decrease in daily PM2.5 concentrations in 2022 compared to 2002: the median was 10.90 in 2022 and 19.30 in 2002. The histograms also show that the 2022 values were less spread out, with a maximum of 38.00 compared to 66.30 in 2002.
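
To put the three spatial levels on the same footing, the change in the median can be expressed as a percentage. The sketch below copies the medians from the summary() output in this report; the data frame `medians` and its column names are hypothetical.

```r
# Medians (ug/m3) copied from the summary() output in this report.
medians <- data.frame(
  level = c("State", "Los Angeles County", "Los Angeles-North Main Street"),
  y2002 = c(12.00, 17.40, 19.30),
  y2022 = c(6.90, 10.30, 10.90)
)

# Percent change from 2002 to 2022; negative means a decrease.
medians$pct_change <- round(100 * (medians$y2022 - medians$y2002) / medians$y2002, 1)
print(medians)
```

By this calculation the median fell by roughly 40 to 44 percent at all three levels.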